A New Method for Rule Finding Via Bootstrapped Confidence Intervals
Author
Abstract
Association rule discovery in large data sets is vulnerable to producing excessive false positives, due to the multiple inference effect. This paper first sets this issue in precise mathematical terms and presents some analytical results. These show that a common concern regarding effects of filtering is not as problematic as had been previously thought. The analytical results also shed new light on a recently proposed method for dealing with the problem. The paper then proposes a new method based on simultaneous confidence intervals, computed via a novel use of the statistical bootstrap tool. The proposal here differs markedly from previous bootstrap/resampling approaches, not only in function but also in basic goal, which is to enable much more active participation by domain experts.
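The abstract does not spell out the paper's procedure, so the following is only a generic sketch of one common way to obtain simultaneous bootstrap intervals for several rule confidences at once (a max-deviation band). It is not the paper's algorithm. The names `rule_confidence` and `simultaneous_intervals`, the toy transactions, and the choices of `n_boot` and `alpha` are all assumptions made for illustration.

```python
import random

def rule_confidence(rows, antecedent, consequent):
    """Estimate P(consequent | antecedent) from a list of item sets."""
    matching = [row for row in rows if antecedent <= row]
    if not matching:
        return None  # antecedent never occurs in this sample
    return sum(consequent in row for row in matching) / len(matching)

def simultaneous_intervals(rows, rules, n_boot=500, alpha=0.05, seed=0):
    """Illustrative max-deviation bootstrap band (an assumed, generic technique)."""
    rng = random.Random(seed)
    estimates = [rule_confidence(rows, a, c) for a, c in rules]
    if any(est is None for est in estimates):
        raise ValueError("every rule antecedent must occur in the data")
    max_devs = []
    for _ in range(n_boot):
        # Resample whole transactions with replacement.
        sample = rng.choices(rows, k=len(rows))
        devs = []
        for (a, c), est in zip(rules, estimates):
            boot = rule_confidence(sample, a, c)
            if boot is not None:
                devs.append(abs(boot - est))
        # Track the largest deviation across all rules in this replicate.
        max_devs.append(max(devs, default=0.0))
    max_devs.sort()
    idx = min(len(max_devs) - 1, int((1 - alpha) * len(max_devs)))
    half_width = max_devs[idx]
    # One shared half-width gives intervals meant to hold jointly.
    return [(max(0.0, e - half_width), min(1.0, e + half_width)) for e in estimates]

# Hypothetical toy transactions, repeated to give a sample of 100 rows.
rows = [{"bread", "milk"}, {"bread"}, {"bread", "milk", "eggs"},
        {"milk"}, {"bread", "eggs"}] * 20
rules = [(frozenset({"bread"}), "milk"), (frozenset({"milk"}), "bread")]
intervals = simultaneous_intervals(rows, rules)
```

Because a single half-width is taken from the distribution of the largest deviation over all rules, the intervals are intended to cover every rule jointly at roughly the 1 - alpha level, in contrast to per-rule intervals, which ignore the multiple inference effect.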
Similar resources
Confidence Intervals for Lower Quantiles Based on Two-Sample Scheme
In this paper, a new two-sample scheme is proposed to construct appropriate confidence intervals for the lower population quantiles. The confidence intervals are determined in the parametric and nonparametric settings, and the optimality problem is discussed in each case. Finally, the proposed procedure is illustrated via a real data set.
Exact maximum coverage probabilities of confidence intervals with increasing bounds for Poisson distribution mean
The Poisson distribution is commonly used as a standard model for analyzing count data, so estimation of its parameter is widely applied in practice. Providing accurate confidence intervals for the parameters of discrete distributions is very difficult. So far, many asymptotic confidence intervals for the mean of the Poisson distribution have been proposed. It is known that the coverag...
A SAS Macro for Calculating Bootstrapped Confidence Intervals About a Kappa Coefficient
Cohen’s kappa coefficient has become a standard method for measuring the degree of agreement between two raters. Confidence intervals for kappa and weighted kappa based on its asymptotic variance are available in the SAS system through the FREQ procedure. However, this variance can become unreliable as sample size decreases or as kappa approaches unity. This paper presents a SAS macro for calcu...
An Application of Profile-Likelihood Based Confidence Interval to Capture-Recapture Estimators
In recent years, more robust methods of estimating the size of a closed population (N) from capture-recapture data have been developed. However, interval estimation for N has seen few practical developments. The usual approach for constructing a confidence interval, known as a Wald confidence interval, is based on the assumption of asymptotic normality. It is well known that the small sample dis...
Estimating Probabilities of Default
We conduct a systematic comparison of confidence intervals around estimated probabilities of default (PD), using several analytical approaches from large-sample theory and bootstrapped small-sample confidence intervals. We do so for two different PD estimation methods—cohort and duration (intensity)—using twenty-two years of credit ratings data. We find that the bootstrapped intervals for the d...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008